© Nobel Prize Outreach. Photo: Hugh Fox
In January 2017, BuzzFeed News published an article on why Nobel laureates show immigration is so important for American science. You can read the article here. In the article, they show that while most living Nobel laureates in the sciences are based in the US, many of them were born in other countries. This is one reason why scientific leaders say that immigration is vital for progress. In this lab, we will work with the data from this article to recreate some of their visualizations as well as explore new questions.
Complete the following steps before you join the live workshop!
You have two tasks you should complete before the workshop:
By now you should have some experience with Git and GitHub, so let us deliberately create a merge conflict to get more practice fixing them.
Git will put conflict markers in your code that look like:
<<<<<<< HEAD
See also: [dplyr documentation](https://dplyr.tidyverse.org/)
=======
See also [ggplot2 documentation](https://ggplot2.tidyverse.org/)
>>>>>>> some1alpha2numeric3string4
The ===s separate your changes (top) from
their changes (bottom).
Note that on top you see the word HEAD, which indicates
that these are your changes.
And at the bottom you see some1alpha2numeric3string4
(well, it probably looks more like
28e7b2ceb39972085a0860892062810fb812a08f).
This is the hash (a unique identifier) of the commit your collaborator made with the conflicting change.
Your job is to reconcile the changes: edit the file so that
it incorporates the best of both versions and delete the
<<<, ===, and
>>> lines. Then you can stage and commit the
result.
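Before staging, it can help to confirm that no marker lines are left behind. The following is a minimal sketch in base R, not part of the lab itself: find_conflict_markers() is a hypothetical helper name, and the two character vectors simply reuse the example lines shown above.

```r
# Hypothetical helper (not part of the lab): return the positions of lines
# that still start with a Git conflict marker.
find_conflict_markers <- function(lines) {
  # A marker line starts with seven "<", "=", or ">" characters.
  which(grepl("^(<{7}|={7}|>{7})", lines))
}

# The conflicted example from above.
conflicted <- c(
  "<<<<<<< HEAD",
  "See also: [dplyr documentation](https://dplyr.tidyverse.org/)",
  "=======",
  "See also [ggplot2 documentation](https://ggplot2.tidyverse.org/)",
  ">>>>>>> some1alpha2numeric3string4"
)

# One possible resolution that keeps both links and drops the markers.
resolved <- c(
  "See also: [dplyr documentation](https://dplyr.tidyverse.org/)",
  "See also: [ggplot2 documentation](https://ggplot2.tidyverse.org/)"
)

find_conflict_markers(conflicted)  # lines 1, 3, and 5 are markers
find_conflict_markers(resolved)    # no markers left
```

To check a real file, you could pass it the result of readLines() on your own document.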
Complete the following steps during the live workshop with your team.
Our goal is to see two different types of merges: first, a merge where Git cannot figure out what to do on its own (a merge conflict) and requires human intervention, then a merge where Git can figure out what to do without human intervention.
Doing this will require some tight choreography, so pay attention!
Take turns in completing the exercise, only one member at a time. Others should just watch, not doing anything on their own projects (this includes not even pulling changes!) until they are instructed to. If you feel like you will not be able to resist the urge to touch your computer when it is not your turn, we recommend putting your hands in your pockets or sitting on them!
Before starting: everyone should have the repo cloned and know which role number(s) they are.
Role 1:
In author in the YAML, type the team member names.
🛑 Make sure the previous role has finished before moving on to the next step.
Role 2:
In author in the YAML, type the team member names but in a different order.
🛑 Make sure the previous role has finished before moving on to the next step.
Role 3:
Change the label of chunk1 to load-data, as it is more informative about what the code in the chunk is doing.
🛑 Make sure the previous role has finished before moving on to the next step.
Role 4:
Change the label of chunk1 to reading-nobel-data.
🛑 Make sure the previous role has finished before moving on to the next step.
Everyone: Pull, and observe the changes in your document.
Before getting started with the Exercises, run the following code in the Console to load this package.
library(tidyverse)
The dataset for this assignment can be found as a CSV file in the data folder of your repository. You can read it in using the following.
nobel <- read_csv("data/nobel.csv")
The variable descriptions are as follows:
id: ID number
firstname: First name of laureate
surname: Surname
year: Year prize won
category: Category of prize
affiliation: Affiliation of laureate
city: City of laureate in prize year
country: Country of laureate in prize year
born_date: Birth date of laureate
died_date: Death date of laureate
gender: Gender of laureate
born_city: City where laureate was born
born_country: Country where laureate was born
born_country_code: Code of country where laureate was born
died_city: City where laureate died
died_country: Country where laureate died
died_country_code: Code of country where laureate died
overall_motivation: Overall motivation for recognition
share: Number of other winners award is shared with
motivation: Motivation for recognition
In a few cases the name of the city/country changed after the prize was given (e.g. in 1975 Bosnia and Herzegovina was called the Socialist Federative Republic of Yugoslavia). In these cases the variables below reflect a different name than their counterparts without the suffix _original.
born_country_original: Original country where laureate was born
born_city_original: Original city where laureate was born
died_country_original: Original country where laureate died
died_city_original: Original city where laureate died
city_original: Original city where laureate lived at the time of winning the award
country_original: Original country where laureate lived at the time of winning the award
Take turns answering the exercises.
There are some observations in this dataset that we will exclude from our analysis to match the BuzzFeed News results.
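Create a new data frame called nobel_living that filters for
laureates for whom country is available
laureates who are people as opposed to organizations (organizations are denoted with "org" as their gender)
laureates who are still alive (their died_date is NA)
Confirm that once you have filtered for these characteristics you are left with a data frame with 228 observations, once again using inline code.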
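As a rough illustration of this kind of filtering, here is a minimal sketch on made-up toy data rather than the real nobel data. The data frame toy_nobel and its four rows are invented for this example; only the column names follow the variable descriptions above. In the real data died_date is a date rather than text, but is.na() works the same way.

```r
library(tidyverse)

# Made-up toy rows (NOT the real data); column names mirror nobel.
toy_nobel <- tribble(
  ~firstname,   ~country, ~gender,  ~died_date,
  "Laureate A", "USA",    "female", NA,
  "Laureate B", NA,       "male",   NA,
  "Org C",      "Norway", "org",    NA,
  "Laureate D", "France", "male",   "1990-01-01"
)

toy_living <- toy_nobel %>%
  filter(
    !is.na(country),   # country is available
    gender != "org",   # people, not organizations
    is.na(died_date)   # no death date recorded, so still alive
  )

# Number of rows that pass all three conditions.
nrow(toy_living)
```

Separating conditions with commas inside filter() combines them with a logical AND, so a row is kept only if it satisfies all three.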
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
… says the BuzzFeed News article. Let us see if that is true.
First, we will create a new variable to identify whether the laureate
was in the US when they won their prize. We will use the
mutate() function for this. The following pipeline mutates
the nobel_living data frame by adding a new variable called
country_us. We use an if statement to create this
variable. The first argument in the if_else() function
we’re using to write this if statement is the condition we’re testing
for. If country is equal to "USA", we set
country_us to "USA". If not, we set the
country_us to "Other".
Note that we can achieve the same result using the fct_other() function we have seen before (i.e. with country_us = fct_other(country, "USA")). We decided to use if_else() here to show you one example of an if statement in R.
nobel_living <- nobel_living %>%
  mutate(
    country_us = if_else(country == "USA", "USA", "Other")
  )
Next, we will limit our analysis to only the following categories: Physics, Medicine, Chemistry, and Economics.
nobel_living_science <- nobel_living %>%
  filter(category %in% c("Physics", "Medicine", "Chemistry", "Economics"))
For the next exercise work with the nobel_living_science data frame you created above. This means you will need to define this data frame in your R Markdown document, even though the next exercise does not explicitly ask you to do so.
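To see how if_else(), fct_other(), and %in% behave, here is a small sketch on made-up toy data. The data frame toy and the column names country_us_chr and country_us_fct are invented for this comparison and are not part of the lab. It suggests that the two approaches give the same labels, with the difference that if_else() returns a character vector while fct_other() returns a factor.

```r
library(tidyverse)

# Made-up toy rows (NOT the real data).
toy <- tibble(
  country  = c("USA", "Japan", "USA", "Germany"),
  category = c("Physics", "Peace", "Literature", "Chemistry")
)

toy <- toy %>%
  mutate(
    # Character result: "USA" where the condition holds, "Other" elsewhere.
    country_us_chr = if_else(country == "USA", "USA", "Other"),
    # Factor result: keep the level "USA", lump everything else into "Other".
    country_us_fct = fct_other(country, keep = "USA")
  )

# The labels agree once the factor is converted back to character.
all(toy$country_us_chr == as.character(toy$country_us_fct))

# %in% keeps rows whose category matches any value in the vector.
toy_science <- toy %>%
  filter(category %in% c("Physics", "Medicine", "Chemistry", "Economics"))

toy_science
```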
Create a faceted bar plot visualizing the relationship between the category of prize and whether the laureate was in the US when they won the Nobel Prize. Interpret your visualization, and say a few words about whether the BuzzFeed News headline is supported by the data.
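If you have not built a faceted bar plot before, the general shape of the code might look like the sketch below. It uses made-up toy data (toy_science is invented here and stands in for your own data frame), and the axis labels are only suggestions, so treat it as a starting point rather than the expected answer.

```r
library(tidyverse)

# Made-up toy rows (NOT the real data).
toy_science <- tibble(
  category   = c("Physics", "Physics", "Chemistry", "Chemistry", "Medicine"),
  country_us = c("USA", "Other", "USA", "USA", "Other")
)

# One bar per value of country_us, one panel (facet) per prize category.
p <- ggplot(toy_science, aes(x = country_us)) +
  geom_bar() +
  facet_wrap(~category) +
  labs(
    x = "Country when prize was won",
    y = "Number of laureates"
  )

p
```

geom_bar() counts the rows in each group for you, so the data frame does not need a precomputed count column.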
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Aim to make it to this point during the workshop.
Hint:
You should be able to borrow from code you used earlier to create the country_us variable.
Create a new variable called born_country_us that has the value "USA" if the laureate is born in the US and "Other" otherwise. How many of the winners are born in the US?
Add a second variable to your visualization from Exercise 3 based on whether the laureate was born in the US or not. Based on your visualization, do the data appear to support BuzzFeed News’s claim? Explain your reasoning in 1-2 sentences.
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Note that your bar plot won’t exactly match the one from the BuzzFeed News article. This is likely because the data has been updated since the article was published. Do not worry if you cannot make the bar plot have all the bars in decreasing order of size—but you might like to try anyway!
In a single pipeline, filter for laureates who won their prize in the US but were born outside of the US, and then create a frequency table (with the count() function) for their birth country (born_country) and arrange the resulting data frame in descending order of number of observations for each country. Note that we are now no longer restricting to only the four science prizes. Which country is the most common? Make a (horizontal) bar plot of the data.
Hint: you will need to go back to the nobel_living data frame. You may wish to use mutate again to recreate the country_us and born_country_us variables.
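A frequency table sorted by count, and a horizontal bar plot of it, can be sketched on made-up toy data as follows. The data frame toy_born is invented for this example, and using fct_reorder() to order the bars by size is one possible choice, not something the lab requires.

```r
library(tidyverse)

# Made-up toy rows (NOT the real data).
toy_born <- tibble(
  born_country = c("USA", "United Kingdom", "USA", "Germany", "USA", "Germany")
)

# count() adds a column n with the number of rows per birth country;
# arrange(desc(n)) puts the most common country first.
born_counts <- toy_born %>%
  count(born_country) %>%
  arrange(desc(n))

born_counts

# Mapping the count to x and the country to y gives horizontal bars;
# fct_reorder() orders the countries by their count.
ggplot(born_counts, aes(x = n, y = fct_reorder(born_country, n))) +
  geom_col() +
  labs(x = "Number of laureates", y = "Birth country")
```

Note that geom_col() is used here instead of geom_bar() because the counts have already been computed.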
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Now go back through your write-up to make sure you have answered all questions and all of your R chunks are properly labelled.
Once you decide as a team that you are done with this lab, all members of the team should pull the changes and knit the R Markdown document to confirm that they can reproduce the report.
Team member 1 should now consider removing the other team members as collaborators to prevent them from making any further changes to your GitHub repository.
Team members 2+ should now take a clone of the lab worksheet from team member 1 so that you have your own personal copy of today’s exercises.
The plots in the BuzzFeed News article are called waffle plots. You can find the code used for making these plots in BuzzFeed News’s GitHub repo (yes, they have one!) here. You are not expected to recreate them as part of your assignment, but you are welcome to do so for fun!